Load the Required Libraries
> library(easypackages)
> libraries("dplyr", "reshape2", "readxl", "ggpubr","stringr", "ggplot2",
+ "tidyverse","lme4", "data.table", "readr","plotly", "DT",
+ "pheatmap","asreml", "VennDiagram", "patchwork", "heatmaply",
+ "ggcorrplot", "RColorBrewer", "hrbrthemes", "tm", "proustr", "arm",
+ "gghighlight", "desplot", "gridExtra", "TeachingDemos", "scales", "ASExtras4",
+ "FactoMineR", "corrplot", "factoextra", "asremlPLUS")This section shows the analysis of filtered phenotypic data in ASReml R package. The filtered data set was obtained after pre-processing and Quality check of data
In this section, data analysis will be shown only for grain yield trait using a Linear Mixed-Model Approach in ASReml-R package.
Demo data analysis for grain yield will also be shown using freely available lme4 R Package package, and will be useful to the users who do not have access to the commercial ASReml-R package. See the another HTML file for analysis in lme4 R package.
In general analysis pipeline is divided in two parts:
We will be testing various mixed models correcting for experimental design factors (Blocks and Replications) and spatial trends in the field.
The best model will be identified and selected (model having lowest AIC value ) to extract the BLUPs or predicted means and Heritability.
Just a note, BLUPs are best for phenotypic selection, users can also extract the BLUEs for treating genotypes as Fixed
As mentioned above five mixed models will be used to account for experimental design factors and accounting for spatial trends in field.
The five models shown here are for demo purpose, more models can be used to model the phenotypic data. For more information on these models and other advanced additional mixed models is available here: Asreml-R-Tutorial: Go to section 4.1; Book: Genetic Data Analysis for Plant and Animal Breeding; Chapter 7.
Description of Models
Model 1
In this model we account for just experimental design factor Block and no spatial effects.
Note we used block as random effect and replication as fixed effect (due to less than 5 degrees of freedom). If you are interested to know whether to use block fixed or random in model I highly recommend this Blocks Fixed or Random?
\[ y_{ijk}= \mu+g_{i} + r_{j}+ b_{jk} + \epsilon_{ijk}\\ y_{ijk}= \text{ is the effect of $i$th genotype in $j$th replication and $k$th block within the $j$th replication} \\ \mu= \text {overall mean}\\ g_{i}=\text{random effect of the $i$th genotype}\\ r_{j}=\text{fixed effect of the $j$th replication}\\ b_{jk}= \text {random effect of $k$th block nested within $j$ replication}\\ \varepsilon_{ijk}=\text{residual error}\\ \text{here we assume errors are independent and identically distributed }\epsilon\sim \text{$iid$N}(0,\sigma_\epsilon^2)\\ \]
R script in Asreml
model1<-asreml(fixed=trait~Rep, random=~Genotype+Rep:Block, na.method=“include”, data=data)
Model 2
\[ y_{ijklm}= \mu+g_{i} + r_{j}+ b_{jk} + c_{l}+ ro_{m}+\epsilon_{ijklm}\\ y_{ijklm}= \text{ is the effect of $i$th genotype in $j$th replication, $k$th block (within the $j$th replication) in $l$th column and $m$th row} \\ \mu= \text {overall mean}\\ g_{i}=\text{random effect of the $i$th genotype}\\ r_{j}=\text{fixed effect of the $j$th replication}\\ b_{jk}= \text {random effect of $k$th block nested with $j$ replication}\\ c_{l}=\text {random effect of $l$th column}\\ ro_{m}=\text {random effect of $m$th row}\\ \varepsilon_{ijklm}=\text{residual error}\\ \text{here we assume residuals are independent and identically distributed }\epsilon\sim \text{$iid$N}(0,\sigma_\epsilon^2)\\ \]
R script in Asreml
model2<-asreml(fixed=trait~Rep, random=~Genotype+Rep:Block+Column+Row, na.method=“include”, data=data)
Model 3
\[ y_{ijk}= \mu+g_{i} + r_{j}+ b_{jk} + \epsilon_{ijk}\\ y_{ijk}= \text{ is the effect of $i$th genotype in $j$th replication and $k$th block within the $j$th replication} \\ \mu= \text {overall mean}\\ g_{i}=\text{random effect of the $i$th genotype}\\ r_{j}=\text{fixed effect of the $j$th replication}\\ b_{jk}= \text {random effect of $k$th block nested within $j$ replication}\\ \epsilon_{ijk}=\text {residual error}\\ \]
here, we assume \(\epsilon\) is a random effect that represents correlated residuals based on the distance between plots along both the rows and columns, where, \(\epsilon\sim N(0,\mathbf{R})\) and R is the covariance matrix of \(\epsilon\). The difference between this model and model 1 and model 2 described above is the structure of the covariance residuals \(R ={\sigma_\epsilon^2\ \Sigma}_c\left(\rho_c\right)\otimes\Sigma_r\left(\rho_r\right)\). \(\sigma_\epsilon^2\) is the variance of spatially dependent residual; \({\Sigma}_c\left(\rho_c\right)\) and \(\ \Sigma_r\left(\rho_r\right)\) represents the first-order autoregressive correlation matrices and \(\rho_{c\ }\) and \(\rho_{ro\ }\) are the autocorrelation parameters for the columns and rows; \(\otimes\) represents the Kronecker product between separable auto-regressive processes of the first order in the row-column dimensions. For more details on this, these references would be helpful Gilmour et al., 1997; Gogel et al., 2018; Andrade et al., 2020; Bernardeli et al.202
R script in Asreml
model3<-asreml(fixed=trait~Rep, random=~Rep:Block+Genotype,residual =~ar1v(Block):ar1(Column), na.method=“include”, data=data)
Model 4
\[ y_{ijk}= \mu+g_{i} + r_{j}+ b_{jk} + \epsilon_{ijk}\\ y_{ijk}= \text{ is the effect of $i$th genotype in $j$th replication and $k$th block within the $j$th replication} \\ \mu= \text {overall mean}\\ g_{i}=\text{random effect of the $i$th genotype}\\ r_{j}=\text{fixed effect of the $j$th replication}\\ b_{jk}= \text {random effect of $k$th block nested within $j$ replication}\\ \epsilon_{ijk}=\text {residual error}\\ \]
here, we assume \(\epsilon\) is a random effect that represents correlated residual across rows only, where, \(\epsilon\sim N(0,\mathbf{R})\) and R is the covariance matrix of \(\epsilon\), and \(\mathbf{R}={\sigma_\epsilon^2\ \Sigma}_c\left(\rho_c\right)\otimes I_r\). \(\sigma_\epsilon^2\) is the variance of spatially dependent residual; \({\Sigma}_c\left(\rho_c\right)\) represents the first-order autoregressive correlation matrices and \(\rho_{ro\ }\) the autocorrelation parameters for the rows only; \(I_r\) represents independently and identically distributed variance structure for rows.
R script in Asreml
model4<-asreml(fixed=trait~Rep, random=~Genotype+Rep:Block,residual =~ar1(Row):idv(Column), na.method=“include”, data=data)
Model 5
\[ y_{ijk}= \mu+g_{i} + r_{j}+ b_{jk} + \epsilon_{ijk}\\ y_{ijk}= \text{ is the effect of $i$th genotype in $j$th replication and $k$th block within the $j$th replication} \\ \mu= \text {overall mean}\\ g_{i}=\text{random effect of the $i$th genotype}\\ r_{j}=\text{fixed effect of the $j$th replication}\\ b_{jk}= \text {random effect of $k$th block nested within $j$ replication}\\ \epsilon_{ijk}=\text {residual error}\\ \]
here, we assume \(\epsilon\) is a random effect that represents correlated residual across columns only, where, \(\epsilon\sim N(0,\mathbf{R})\) and R is the covariance matrix of \(\epsilon\), and \(\mathbf{R}={\sigma_\epsilon^2\ \Sigma}_c\left(\rho_c\right)\otimes I_r\). \(\sigma_\epsilon^2\) is the variance of spatially dependent residual; \({\Sigma}_c\left(\rho_c\right)\) represents the first-order autoregressive correlation matrices and \(\rho_{c\ }\) the autocorrelation parameters for the columns only; \(I_r\) represents independently and identically distributed variance structure for rows.
R script in Asreml
model5<-asreml(fixed=trait~Rep, random=~Genotype+Rep:Block,residual =~idv(Row):ar1(Column), na.method=“include”, data=data))
> rm(list=ls()) # Remove previous work
> # Read the saved csv file, if not present then read it from working directory
> if(exists('demo.data.filtered') && is.data.frame(get('demo.data.filtered'))){
+ demo.data.filtered=demo.data.filtered
+ }else{
+ demo.data.filtered<-read.csv(file="~/Documents/Analysis-pipeline/Outputs/Tables/demo.data.filtered.csv",
+ header = TRUE)
+ # Factor conversion
+ columns<-c("Environment", "Genotype", "Rep", "Block", "Row", "Column", "Line.type")
+ demo.data.filtered[, columns]<-lapply(columns, function(x) as.factor(demo.data.filtered[[x]]))
+ demo.data.filtered$Yield<-as.numeric(demo.data.filtered$Yield)
+ demo.data.filtered$HT<-as.numeric(demo.data.filtered$HT)
+ demo.data.filtered$DTF<-as.numeric(demo.data.filtered$DTF)
+ }
> # Subset the required columns
> demo.data.filtered<-demo.data.filtered[, c("Environment", "Genotype", "Rep", "Block",
+ "Row", "Column", "Line.type", "Yield", "HT", "DTF")]
>
> # First we will arrange the rows and columns for spatial analysis.
> demo.data.filtered<-data.frame(demo.data.filtered%>% group_by(Environment)%>%arrange(Row, Column)) # arrange by row and column
> demo.data.filtered<-data.frame(demo.data.filtered%>% arrange(Environment)) # Arrange by environment
> #demo.data.filtered<-demo.data.filtered[!demo.data.filtered$Environment %in% c("Env2", "Env5","Env8", "Env9"), ]In this section we will run all the Five models described above and extract the model with lower AIC values in each environment. More on AIC and BIC values can be found here Click here.
We will use for loop to run all the five models across all the environments
> # Run the for loop to extract the AIC values
> demo.data.filtered$Environment<- as.character(demo.data.filtered$Environment)
> un.en<- unique(demo.data.filtered$Environment)
> # models<-c("model1", "model2", "model3", "model4", "model5")
> AIC<-data.frame()
> for(i in 1:length(un.en)){
+ sub<- droplevels.data.frame(demo.data.filtered[which(demo.data.filtered$Environment==un.en[i]),])
+ # Accounting blocks and replications
+ model1<-asreml(fixed=Yield~Rep, random=~Genotype+Rep:Block,
+ na.method="include", data=sub)
+ aic.model1<- -2 *model1$loglik +2 *length(model1$vparameters)
+ # Model 2: Accounting blocks, replications and rows and columns.
+ model2<-asreml(fixed=Yield~Rep, random=~Genotype+Column+Row,
+ na.method="include", data=sub)
+ aic.model2<- -2 *model2$loglik +2 *length(model2$vparameters)
+ # Model 3: Accounting block, Replication, and for spatial trends rows and columns.
+ model3<-asreml(fixed=Yield~Rep, random=~Genotype+Rep:Block,
+ residual =~ar1v(Row):ar1(Column), na.method="include", data=sub)
+ aic.model3<- -2*model3$loglik +2*length(model3$vparameters)
+ # Model 4: Accounting for experimental design factors and spatial trends.
+ model4<-asreml(fixed=Yield~Rep, random=~Genotype+Rep:Block,
+ residual =~ar1(Row):idv(Column), na.method="include", data=sub)
+ aic.model4<- -2*model4$loglik +2*length(model4$vparameters)
+ # Model 5: In this model we account for spatial trends across columns only.
+ model5<-asreml(fixed=Yield~Rep, random=~Genotype+Rep:Block,
+ residual =~idv(Row):ar1(Column), na.method="include", data=sub)
+ aic.model5<- -2*model5$loglik +2*length(model5$vparameters)
+ AIC<-data.frame(model1=aic.model1, model2=aic.model2, model3=aic.model3,
+ model4=aic.model4, model5=aic.model5, Environment=un.en[i])
+ if(i>1){
+ AIC.all<-rbind(AIC.all, AIC)
+ }
+ else{
+ AIC.all<-AIC
+ }
+ #models<-list( model1, model2, model3, model4, model5)
+ #all.plots<-list(plot.model1, plot.model1,plot.model1, plot.model1,plot.model1)
+ #return(AIC.all)
+ }> # Round the AIC values
> AIC.all<-data.frame(lapply(AIC.all, function(y) if(is.numeric(y)) round(y, 2) else y))
> # View as table
> print_table <- function(table, ...){
+ datatable(table, extensions = 'Buttons',
+ options = list(scrollX = TRUE,
+ dom = '<<t>Bp>',
+ buttons = c('copy', 'excel', 'pdf', 'print')), ...)
+ }
>
> print_table(AIC.all, editable = 'cell',
+ rownames = FALSE, caption = htmltools::tags$caption("Table: Showing the models with AIC values in all the environments.",style="color:black; font-size:130%"), filter = 'top')> row.names(AIC.all)<-AIC.all$Environment
> AIC.all<-AIC.all[,-6]
> #AIC.all<-as.matrix(AIC.all)
> AIC.min<-as.data.frame(apply( AIC.all, 1, which.min))
> colnames(AIC.min)<-"AIC.value"
> datatable(AIC.min)The second column shows which model is best in each environment.
Here in this section we will select and run the best model and extract BLUPs (know more on BLUPs or BLUEs here).
Best model will be selected based on lower AIC values and also residual plot. Lower the AIC value better is the model. For example for grain yield under environment 1 Model 5 has lower AIC values and will be used to extract the BLUPs
We will also calculate the heritability’s. Note we are dealing with complex models so we will not use basic formula as: \(h{^2}= \frac{\sigma^{2}g}{\sigma^{2}g+\sigma^{2}e}\) to calculate entry mean based heritability. Alternative methods have been proposed to genralize the heritability and is very well described in this paper click here
Two alternative methods ( based on variance of a difference between genotypes) can be used to calculate heritability:
The above two methods are more appropriate in breeding and selection of genotypes to calculate heritability through estimating genotype differences ratherthan the precision of the genotype effect estimates. Further this definition of heritability is related to reliability of breeding value predictions. For more details please check the Book: Genetic Data Analysis for Plant and Animal Breeding; Chapter 7; Paper, and this beautiful resource Summary of heritability equations
> demo.data.filtered$Environment<- as.character(demo.data.filtered$Environment)
> un.en<- unique(demo.data.filtered$Environment)
> #models<-c("model1", "model2", "model3", "model4", "model5")
> AIC<-data.frame()
> for(i in 1:length(un.en)){
+ sub<- droplevels.data.frame(demo.data.filtered[which(demo.data.filtered$Environment==un.en[i]),])
+ # Model 5 is best in Env1
+ if (sub$Environment=="Env1") {
+ model<-asreml(fixed=Yield~Rep, random=~Genotype+Rep:Block,residual =~idv(Row):ar1(Column),
+ na.method="include", data=sub)
+ # Model 5 is best in Env6
+ } else if(sub$Environment=="Env6"){
+ model<-asreml(fixed=Yield~Rep, random=~Genotype+Rep:Block,residual =~idv(Row):ar1(Column),
+ na.method="include", data=sub)
+ # Model 3 is best in Env7
+ } else if (sub$Environment=="Env7") {
+ model<-asreml(fixed=Yield~Rep, random=~Genotype+Rep:Block,residual =~ar1v(Row):ar1(Column), na.method="include", data=sub)
+ # Model 3 is best in Env7
+ }else if (sub$Environment=="Env10") {
+ model<-asreml(fixed=Yield~Rep, random=~Genotype+Rep:Block,residual =~ar1v(Row):ar1(Column), na.method="include", data=sub)
+ } else {
+ # In rest of environments model 1 was best
+ model<-asreml(fixed=Yield~Rep, random=~Genotype+Rep:Block, na.method="include", data=sub)
+ }
+ vc.g <- summary(model)$varcomp['Genotype','component']
+ vc.g
+ # Mean variance of a difference of two genotypic BLUPs
+ vdBLUP.mat <- predict(model, classify="Genotype", sed=TRUE)$sed^2
+ vdBLUP.avg <- mean(vdBLUP.mat[upper.tri(vdBLUP.mat, diag=FALSE)])
+ vdBLUP.avg
+
+ # H2 Cullis method only #
+ H2Cullis <- 1 - (vdBLUP.avg/(vc.g*2))
+ H2Cullis
+ heritability<-data.frame(H2Cullis, Environment=un.en[i])
+ #Extract BLUPS
+
+ predicted.gy<-predict(model, "Genotype", sed=T)$pvals
+ predicted.gy<-data.frame(predicted.gy, Environment=un.en[i])
+ if(i>1){
+ Reliability<-rbind(Reliability,heritability)
+ predicted.all<-rbind(predicted.all, predicted.gy)
+ }
+ else{
+ Reliability<- heritability
+ predicted.all<- predicted.gy
+ }
+ }> # Round the values
> predicted.all<-data.frame(lapply(predicted.all[,-4],
+ function(y) if(is.numeric(y)) round(y, 2) else y))
> print_table(predicted.all,editable = 'cell', rownames = FALSE,
+ caption = htmltools::tags$caption("BLUPs along with standard errors for grain yield in all environments.", style="color:black; font-size:130%"), filter = 'top')> ggbarplot(Reliability, x = "Environment", y = "H2Cullis",
+ fill = "lightblue", # change fill color by mpg_level
+ color = "black",
+ merge = TRUE,# Set bar border colors to white
+ palette = "jco", # jco journal color palett. see ?ggpar
+ #sort.by.groups =FALSE,
+ x.text.angle = 90, # Rotate vertically x axis texts
+ ylab = "Reliability",
+ xlab = FALSE,
+ rotate=FALSE,
+ x.text.col = TRUE,
+ legend = "top",
+ ggtheme =theme_classic() ,
+ font.legend = 18,
+ #legend.title = "Treatment"
+ )+
+ font("xlab", size = 25, color = "black")+
+ font("ylab", size = 25, color = "black")+
+ font("xy.text", size = 12, color = "black")>
> # Save the file for analysis
> write.csv(predicted.all, file = "~/Documents/Analysis-pipeline/Outputs/Tables/BLUPs.all.csv",
+ row.names = FALSE) > # Let us subset the environment 1
> env1.data<-subset(demo.data.filtered, Environment=="Env1")
> env1.data<-droplevels.data.frame(env1.data)
> # Now run the model that came out best
> model<-asreml(fixed=Yield~Rep, random=~Genotype+Rep:Block,
+ residual =~idv(Row):ar1(Column),
+ na.method="include", data=env1.data)> summary(model)$varcomp
component std.error z.ratio bound %ch
Rep:Block 2.327246e+04 1.854688e+04 1.254791 P 0.1
Genotype 4.052657e+05 5.101289e+04 7.944379 P 0.0
Row:Column!R 1.000000e+00 NA NA F 0.0
Row:Column!Row 2.075041e+05 2.327241e+04 8.916313 P 0.0
Row:Column!Column!cor 3.446818e-01 7.614203e-02 4.526828 U 0.1> summary(model)$bic
[1] 5526.948
attr(,"parameters")
[1] 4
> summary(model)$aic
[1] 5511.063
attr(,"parameters")
[1] 4> wald(model)
[0;34mWald tests for fixed effects.[0m
[0;34mResponse: Yield[0m
[0;34mTerms added sequentially; adjusted for those above.[0m
Df Sum of Sq Wald statistic Pr(Chisq)
(Intercept) 1 5100.2 5100.2 <2e-16 ***
Rep 1 0.2 0.2 0.621
residual (MS) 1.0
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1> plot(varioGram(model5))> # Blups with Standard error
> blups <- predict(model5, classify = "Genotype")$pvals
Model fitted using the sigma parameterization.
ASReml 4.1.0 Fri Sep 17 12:01:17 2021
LogLik Sigma2 DF wall cpu
1 -2991.615 1.0 391 12:01:18 0.1
2 -2991.615 1.0 391 12:01:18 0.0
3 -2991.615 1.0 391 12:01:18 0.0
> # Average standard error of difference
> avgsed<- predict(model5, classify = "Genotype")$avsed
Model fitted using the sigma parameterization.
ASReml 4.1.0 Fri Sep 17 12:01:18 2021
LogLik Sigma2 DF wall cpu
1 -2991.615 1.0 391 12:01:18 0.0
2 -2991.615 1.0 391 12:01:18 0.0
3 -2991.615 1.0 391 12:01:18 0.0> # Get varaince of genotypes
> vc.g <- summary(model)$varcomp['Genotype','component']
> vc.g
[1] 405265.7
> # Mean variance of a difference of two genotypic BLUEs
> vdBLUP.mat <- predict(model, classify="Genotype", sed=TRUE)$sed^2 # obtain squared s.e.d. matrix
Model fitted using the sigma parameterization.
ASReml 4.1.0 Fri Sep 17 12:01:18 2021
LogLik Sigma2 DF wall cpu
1 -2751.531 1.0 392 12:01:18 0.0
2 -2751.531 1.0 392 12:01:18 0.0
3 -2751.531 1.0 392 12:01:18 0.0
> vdBLUP.avg <- mean(vdBLUP.mat[upper.tri(vdBLUP.mat, diag=FALSE)]) # take mean of upper triangle
> vdBLUP.avg
[1] 157575.8
> # Heritability H2 Cullis
> H2Cullis <- 1 - (vdBLUP.avg/(vc.g*2))
> H2Cullis
[1] 0.8055895In this section, we will show how to analyze multi-environment data. More information on this can be found here: Paper1, Paper2.
The MET data analysis can be divided into ways: 1) One-step or Single stage Approach and 2) Two-Step or Stage-wise Approach. More on this can be found here Paper1, Paper2, Paper3, Paper4, and Paper5
Single-stage analysis is golden standard to analyze the mutli-environment data. However, in experiments or trials with unbalanced data, different experimental designs across trials and to avoid computational challenges of analyzing huge trials together-two-stage analysis is more appropriate. In two stage approach adjusted means as BLUEs are estimated per location/trial and weighted BLUEs (associated variance-covariance matrix) are fitted in second step to get the BLUPs or predicted means for each genotype Mohring and Piepho.2009.
Here both the Single-Stage and Stage-wise approach for the MET analysis will be demonstrated.
\[ y_{ijkl}= \mu+g_{i} + e_{j}+ (ge)_{ij}+r_{jk}+ b_{jkl} +\epsilon_{ijkl}\\ \mu= \text {overall mean}\\ g_{i}=\text{random effect of the $i$th genotype}\\ e_{j}=\text{random effect of the $j$th environment}\\ (ge)_{ij}=\text{is the interaction effect of $i$th genotypes with the $j$th environment}\\ r_{jk}=\text{fixed effect of the $k$th replication nested within $j$th environment}\\ b_{jkl}= \text {random effect of $l$th block nested with $j$ environment and $k$th replication}\\ \epsilon_{ijkl}=\text{residual error}\\ \text{here we assume residuals are independent and identically distributed}\\ \]
> rm(list=ls())
> # Read the saved csv file, if working directly
> if(exists('demo.data.filtered') && is.data.frame(get('demo.data.filtered'))){
+ demo.data.filtered=demo.data.filtered
+ }else{
+ demo.data.filtered<-read.csv(file="~/Documents/Analysis-pipeline/Outputs/Tables/demo.data.filtered.csv",
+ header = TRUE)
+ # factor conversion if below are not in factors
+ columns<-c("Environment", "Genotype", "Rep", "Block", "Row", "Column", "Line.type")
+ demo.data.filtered[, columns]<-lapply(columns, function(x) as.factor(demo.data.filtered[[x]]))
+ demo.data.filtered$Yield<-as.numeric(demo.data.filtered$Yield)
+ demo.data.filtered$HT<-as.numeric(demo.data.filtered$HT)
+ demo.data.filtered$DTF<-as.numeric(demo.data.filtered$DTF)
+ }
> # Subset the required columns
> demo.data.filtered<-demo.data.filtered[, c("Environment", "Genotype", "Rep", "Block",
+ "Row", "Column", "Line.type", "Yield", "HT", "DTF")]
> # First we will arrange the rows and columns for spatial analysis.
> # Now we will subset the environments and Yields for analysis
> demo.data.filtered<-data.frame(demo.data.filtered%>% group_by(Environment)%>%arrange(Row, Column)) # arrange by row and column
> demo.data.filtered<-data.frame(demo.data.filtered%>% arrange(Environment)) # Arrange by environment
>
> #demo.data.filtered<-demo.data.filtered[!demo.data.filtered$Environment %in% c("Env2", "Env5","Env8", "Env9"), ]Here we will run the 10 linear mixed models which ranges from basic model to advanced factor analytical models.
The description of these models is given in the manuscript.
We will use the best model (selected based on lower AIC and BIC values) to extract the results.
> # Model 1: Basic model accounting for only environments and no interactions
> met1 <-asreml(Yield~Rep, random =~Genotype+Environment+Rep:Block,
+ na.method="incude",data = demo.data.filtered)
>
> # Model 2: Basic model accounting G x E interactions
> met2 <-asreml(Yield~Rep, random =~Genotype+Environment+ Genotype*Environment+Rep:Block,
+ na.method="incude",data = demo.data.filtered)
>
> # Model 3: Different variances across Environments i.e., heterogeneous error variances across environments
> met3<-asreml(fixed=Yield~Rep,
+ random=~Genotype+Rep:Block+Genotype:Environment,
+ na.method="incude", residual = ~ dsum( ~ units|Environment),
+ data=demo.data.filtered)
> # Model 4: Modelling G x E with spatial trends (same for all environments)
> met4<-asreml(fixed=Yield~Rep,
+ random=~Genotype+Environment+Rep:Block,
+ residual = ~dsum(~ar1(Row):ar1(Column) |Environment),
+ #residual =~ar1v(Row):ar1(Column),
+ na.method="incude", data=demo.data.filtered)
> # Model5: Modelling G x E with spatial trend specific to each environment
> met5<-asreml(fixed=Yield~Rep,
+ random=~Genotype+Environment+Rep:Block,
+ residual =~dsum(~idv(Row):ar1(Column) +ar1v(Row):ar1(Column)+ar1(Row):idv(Column)|Environment,
+ levels = list(c(6,7), c(8,2), c(1,3,4,5,9,10))),
+ na.method="incude", data=demo.data.filtered)
> # Model 6: Heterogeneous genetic variance-each environment has a unique genetic variance with no correlations between environments
> met6 <- asreml(fixed=Yield~Rep,
+ random =~diag(Environment):Genotype+Rep:Block,
+ residual=~dsum(~id(units)|Environment),
+ data = demo.data.filtered)
> # Model 7:Heterogeneous genetic variance accounting common genetic correlations
> met7<- asreml(fixed=Yield~Rep+Environment,
+ random =~corgh(Environment):Genotype+Rep:Block,
+ residual=~dsum(~id(units)|Environment),
+ data = demo.data.filtered)
> # Model 8: Factor Analytically model (fa1)
> met8 <- asreml(fixed=Yield~Rep,
+ random =~diag(Environment):Rep:Block+fa(Environment,2):Genotype,
+ residual=~dsum(~id(units)|Environment),
+ data = demo.data.filtered)
> # Model 9: Factor Analytical model (fa3)
> met9 <- asreml(fixed=Yield~Rep,
+ random =~diag(Environment):Rep:Block+fa(Environment,3):Genotype,
+ residual=~dsum(~id(units)|Environment),
+ data = demo.data.filtered)
> # Model 10: Factor analytical model (fa3) with specific spatial trends
> met10<-asreml(Yield~Rep,
+ random =~diag(Environment):Rep:Block+fa(Environment,3):Genotype,
+ residual =~dsum(~idv(Row):ar1(Column) +ar1v(Row):ar1(Column)+ar1(Row):idv(Column)|Environment,
+ levels = list(c(6,7), c(8,2), c(1,3,4,5,9,10))),
+ data=demo.data.filtered)In this section we will use infoCriteria.asreml function to extract the AIC and BIC values for each model.
Then we will apply it to all the models as list and identify the best model based on lower BIC value.
> # Create the list of all models
> library(asremlPlus)
> all.models<-list(met1,met2, met3, met4, met5, met6, met7, met8, met9, met10)
> # Extract the AIC and BIC values
> all.bic<-lapply(all.models,function(x){infoCriteria.asreml(x)})
> # Save it as data.frame
> all.bic<-as.data.frame(do.call(rbind, all.bic) )
> # Find which model has minimum BIC value
> all.bic[which.min(all.bic$BIC),]
fixedDF varDF NBound AIC BIC loglik
8 0 49 1 56894.78 57202.61 -28398.39Note: Model 8 (Met8) comes out the best model, thus will be used in downstream analysis to extract the results
> # Get all the variance components
> varcomp<-summary(met8)$varcomp
> # Extract the error variance
> error.vars<-varcomp[grep("!R", rownames(varcomp)),"component", drop = F]
> error.vars$Environment<-substr(rownames(error.vars), start = 13, stop = 17)
> error.vars$Environment<-gsub("!","",as.character(error.vars$Environment))
> row.names(error.vars)<-error.vars$Environment> genotyp.var<-varcomp[grep("var", rownames(varcomp)),"component", drop = F]
> #remove the leading characters
> genotyp.var$Environment<-substr(rownames(genotyp.var), start = 29, stop = 33)
> genotyp.var$Environment<-gsub("!","",as.character(genotyp.var$Environment))
> row.names(genotyp.var)<-genotyp.var$Environment> # Combine genotype and phenotypic variances
> varinace.all<-merge(genotyp.var,error.vars, by="Environment" )
> colnames(varinace.all)<-c("Environment", "geno.var", "error.var")
> # Calculate the Heritability
> varinace.all$heritability<-varinace.all$geno.var/(varinace.all$geno.var+varinace.all$error.var)> # Put data in wide format
> varinace.data<- gather(varinace.all, Variance, Value, geno.var:error.var, factor_key=TRUE)
> # Plot the variance graph
> var.plot<-ggplot(varinace.data,
+ aes(x = Environment, y = Value, fill=Variance)) +
+ geom_bar(stat="identity", colour="black") +
+ ggtitle("Error variance at each Environment") +
+ ylab(label = "Error Variance") +
+ # theme(axis.text.x=element_text(angle = 90, hjust = 0))
+ theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
+ panel.background = element_blank(), axis.line = element_line(colour = "black"))
> # Get heritability plot
> heritability.plot<-ggplot(varinace.all,
+ aes(x = Environment, y = heritability)) +
+ geom_bar(stat="identity", colour="black", fill="lightblue") +
+ ggtitle("Heritability at each Environment") +
+ ylab(label = "Error Variance") +
+ # theme(axis.text.x=element_text(angle = 90, hjust = 0))
+ theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
+ panel.background = element_blank(), axis.line = element_line(colour = "black"))
> grid.arrange(var.plot, heritability.plot, nrow = 1)In this section we will use library ASExtras4 to extract the additional results including the regression plots and BLUPs.
We will use function fa.asreml to extract the results. It will return list of 7. For more details click here
> # Get the summary of results from Factor Analytical model
> summary.all<-fa.asreml(met8)
> # Get BLUPs across all Environments (G xE)
> blups<-summary.all$blups$`fa(Environment, 2):Genotype`
> blups.GE<-blups$blups[,-4]
> # Reshape the G x E BLUPs
> biplot.blups<-reshape(blups.GE, idvar = "Genotype", timevar = "Environment", direction = "wide")
> colnames(biplot.blups) <- gsub(x = colnames(biplot.blups),
+ pattern = "blup.", replacement = "")
> # Get PCA on the Biplot
> # Let us work with principle component analysis
> data_pca<-biplot.blups[-1]
> data_pca<- PCA(data_pca, graph = FALSE, scale.unit = TRUE)
> #data_pca <- prcomp(data_pca, scale = TRUE)
> fviz_pca_biplot(data_pca, palette = "jco", geom = "point",
+ #col.ind = blups.names$Type,
+ #pointshape = 21,
+ pointsize = 2,
+ #col.var = factor(c("UMMT", "GBB", "BPT", "RCB", "LNM", "MNM")),
+ addEllipses = TRUE, label = "var",
+ col.var = "blue", repel = TRUE,
+ legend.title = "Group")+
+ xlim(-5, 5) + ylim (-5, 5)> # First let us get the BLUPs for G x E effects
> blups.GE<-blups$blups[,-4]
> # factor conversion
> blups.GE$Environment<-as.factor(blups.GE$Environment)
> blups.GE$Genotype<-as.factor(blups.GE$Genotype)
> # First get top ten genotypes
> top.blups<-blups.GE%>%
+ arrange(desc(blup)) %>% slice(1:10)
> top.blups<-droplevels.data.frame(top.blups)
> # Get names of top genotypes
> top.genotypes<-c(top.blups$Genotype)
> # Now subset the data frame
> blups.GE.top<-blups.GE[blups.GE$Genotype%in%top.blups$Genotype, ]
> # Now get the Factor loadings
> loadings<-data.frame(summary.all$gammas$`fa(Environment, 2):Genotype`$`rotated loads`)
> loadings$Environment<-row.names(loadings)
> # Now merge with top blups file
> top.blups.final<-merge(blups.GE.top, loadings, by="Environment", all = TRUE)
> colnames(top.blups.final)<-c("Environment", "BLUPs", "Genotype", "Factor1",
+ "Factor2")
> # Plot the laten regression plots
> ggplot(top.blups.final, aes(Factor1, BLUPs)) +
+ geom_point(color="darkred") +
+ geom_smooth(method="lm") +
+ facet_wrap(~ Genotype)+
+ #theme_few()+ #use white theme
+ labs(title="Latent Regression Plot",x="Factor Loading 1", y = "BLUP")+
+ theme_light ()+
+ theme (plot.title = element_text(color="black", size=16, face="bold", hjust=0.5),
+ axis.title.x = element_text(color="black", size=16),
+ axis.title.y = element_text(color="black", size=16)) +
+ theme(axis.text= element_text(color = "black", size = 10))+
+ theme(legend.position="none")+
+ #facet_grid(~ timepoint,ncol=4,scales = "free")+
+ theme(strip.text.x = element_text(size = 10,face="bold",colour = "black"))+ #adding theme and background to headings
+ theme(strip.background = element_rect(fill = "lightblue", color = "black", size=1.5))The figure shows the latent regression plots for top 10 genotypes. From the figure it is obvious that genotypes 127, 139, 146, 175, and 22 are high yielding and stable genotypes across the all the environments
> # We will use again summary.all results to extract the Correlationship matrix
> Corr.matrix<-summary.all$gammas$`fa(Environment, 2):Genotype`$Cmat
> # Get the genotype matrix
> G.matrix<-summary.all$gammas$`fa(Environment, 2):Genotype`$Gmat
> #corrplot(Corr.matrix, type = "upper", order = "hclust",
> #tl.col = "black", tl.srt = 45)
> pheatmap(Corr.matrix, cluster_rows = FALSE, cluster_cols = FALSE, scale = "none",
+ legend_labels = "-log10(p-value)", fontsize = 12, angle_col = 90)> # Get the Predicted Values for MET
> blups.met<-predict(met8, "Genotype", sed=T)$pvals
Multi-section model using the sigma parameterization.
ASReml 4.1.0 Fri Sep 17 12:02:05 2021
LogLik Sigma2 DF wall cpu
1 -28398.24 1.0 3953 12:02:05 0.1
2 -28398.19 1.0 3953 12:02:05 0.0
3 -28398.01 1.0 3953 12:02:06 0.0
4 -28397.77 1.0 3953 12:02:06 0.0
5 -28397.49 1.0 3953 12:02:06 0.0
6 -28397.16 1.0 3953 12:02:06 0.0
7 -28396.81 1.0 3953 12:02:06 0.0 (1 restrained)
8 -28396.53 1.0 3953 12:02:06 0.0 (1 restrained)
9 -28396.47 1.0 3953 12:02:06 0.0 (1 restrained)
10 -28396.47 1.0 3953 12:02:06 0.1 (1 restrained)
11 -28396.47 1.0 3953 12:02:06 0.1
12 -28396.47 1.0 3953 12:02:06 0.1
> # Plot the Histogram of MET BLUPs
> ggplot(data=blups.met, aes(predicted.value)) +
+ geom_histogram(color="darkblue", fill="lightblue")+ # adjust x values and breaks
+ geom_vline(aes(xintercept=mean(predicted.value)), # adding the line to represent mean
+ color="darkred", linetype="dashed", size=1)+
+ labs(title="",x="Value", y = "Count")+
+ theme_classic()+
+ theme (plot.title = element_text(color="black", size=14, face="bold", hjust=0),
+ axis.title.x = element_text(color="black", size=10, face="bold"),
+ axis.title.y = element_text(color="black", size=10, face="bold")) +
+ theme(axis.text= element_text(face = "bold", color = "black", size = 8))> # Arrange the BLUPs in decreasing order
> blups.met<-blups.met%>%arrange(desc(predicted.value))
> # Select top 35 and merge with checks
> blups.top50<-data.frame(blups.met[1:50, ])
> blups.names<-merge(blups.top50[, c(1,2)],biplot.blups,by="Genotype", all=TRUE)
> blups.names$Type<-ifelse(is.na(blups.names$predicted.value), "Un-selected", "Selected")
> # Now let us plot the biplot
> fviz_pca_biplot(data_pca, palette = "jco", geom = "point",
+ col.ind = blups.names$Type,
+ #pointshape = 21,
+ pointsize = 2,
+ #col.var = factor(c("Env1", "Env2", "Env3", "Env5", "Env6",
+ # "Env7","Env8", "Env9", "Env10", "Env11")),
+ addEllipses = TRUE, label = "var",
+ col.var = "blue", repel = TRUE,
+ legend.title = "Group")+
+ xlim(-6, 6) + ylim (-6, 6)Note: Most of the top genotypes selected are highly stable
\[ y_{ijk}= \mu+g_{i} + r_{j}+ b_{jk} + \epsilon_{ijk}\\ y_{ijk}= \text{ is the effect of $i$th genotype in $j$th replication and $k$th block within the $j$th replication} \\ \mu= \text {overall mean}\\ g_{i}=\text{random effect of the $i$th genotype}\\ r_{j}=\text{fixed effect of the $j$th replication}\\ b_{k}= \text {random effect of $k$th block nested with $j$ replication}\\ \epsilon_{ijk}=\text {residual error}\\ \]
here, we assume \(\epsilon\) is a random effect that represents correlated residuals based on the distance between plots along both the rows and columns, where, \(\epsilon\sim N(0,\mathbf{R})\) and R is the covariance matrix of \(\epsilon\) with \(R ={\sigma_\epsilon^2\ \Sigma}_c\left(\rho_c\right)\otimes\Sigma_r\left(\rho_r\right)\). \(\sigma_\epsilon^2\) is the variance of spatially dependent residual; \({\Sigma}_c\left(\rho_c\right)\) and \(\ \Sigma_r\left(\rho_r\right)\) represents the first-order autoregressive correlation matrices and \(\rho_{c\ }\) and \(\rho_{ro\ }\) are the autocorrelation parameters for the columns and rows; \(\otimes\) represents the Kronecker product between separable auto-regressive processes of the first order in the row-column dimensions.
> # Run the model for each environment
> demo.data.filtered$Environment<- as.character(demo.data.filtered$Environment)
> un.en<- unique(demo.data.filtered$Environment)
> # models<-c("model1", "model2", "model3", "model4", "model5")
> for(i in 1:length(un.en)){
+ sub<- droplevels.data.frame(demo.data.filtered[which(demo.data.filtered$Environment==un.en[i]),])
+ # Model 5 is best in Env1
+ if (sub$Environment=="Env1") {
+ model<-asreml(fixed=Yield~Rep, random=~Genotype+Rep:Block,residual =~ar1(Row):idv(Column),
+ na.method="include", data=sub)
+ # Model 5 is best in Env6
+ } else if(sub$Environment=="Env6"){
+ model<-asreml(fixed=Yield~Rep, random=~Genotype+Rep:Block,residual =~ar1(Row):idv(Column),
+ na.method="include", data=sub)
+ # Model 3 is best in Env7
+ } else if (sub$Environment=="Env7") {
+ model<-asreml(fixed=Yield~Rep, random=~Genotype+Rep:Block,residual =~ar1v(Row):ar1(Column), na.method="include", data=sub)
+ # Model 3 is best in Env7
+ }else if (sub$Environment=="Env10") {
+ model<-asreml(fixed=Yield~Rep, random=~Genotype+Rep:Block,residual =~ar1v(Row):ar1(Column), na.method="include", data=sub)
+ } else {
+ # In rest of environments model 1 was best
+ model<-asreml(fixed=Yield~Rep, random=~Genotype+Rep:Block, na.method="include", data=sub)
+ }
+ #Extract BLUPS
+ predicted.gy<-predict(model, "Genotype", sed=T)$pvals
+ predicted.gy<-data.frame(predicted.gy, Environment=un.en[i])
+ if(i>1){
+ predicted.all<-rbind(predicted.all, predicted.gy)
+ }
+ else{
+ predicted.all<- predicted.gy
+ }
+ }In the second stage, mixed model is fitted across each environment using the weighted adjusted means obtained from first stage as response variable.
The weighted BLUEs were used to take care of the heterogeneous error variance and weights were calculated by the inverse of the squared standard error of BLUEs.
The model used follows as:
\[ y_{ij}= \mu+g_{i} + e_{j}+ (ge)_{ij}+\epsilon_{ij}\\ y_{ij}= \text{ is the BLUE value for $i$th observation in $j$th environment}\\ \mu= \text {overall mean}\\ g_{i}=\text{the random effect of $i$th genotype} \\ e_{j}=\text{fixed effect of the $j$th environment}\\ (ge)_{ij}=\text{is interaction effect of $i$th genotype genotype with the $j$th genotype environment}\\ \epsilon_{ij}=\text{residual error}\\ \] Here, we assume \(g_{i} ∼N (0, \sigma_g^2)\) where \(\sigma_g^2\) is the genetic variance, and the error is know from the first stage which is give as \(\epsilon~N(0,R)\), where \(R\) is the diagonal matrix, in which its elements are given by the inverse of the residual variance adjusted means of genotypes in each environment. Further, we can assume the different covariance structures among the residuals and random effects as show in the single stage analysis.
> # First let us get weighted BLUEs
> wt<- predicted.all$std.error^2
> wt<- 1/wt
> #names(wt)<-"weights"
> # Check for missing and convert it to 0
> wt[is.na(wt)] = 0
> # Now fit the model with A matrix and extract Breeding values/BLUPs
> # Factor conversion
> # Now run the model, will use full matrix rather than inverse
> asreml.options(gammaPar = TRUE)
> predicted.all$Environment<-as.factor(predicted.all$Environment)
> model.com<-asreml(fixed= predicted.value ~1, random=~Genotype+Environment+Genotype*Environment,
+ workspace= 1000e06,weights=wt,
+ na.action=na.method(x="include"), data=predicted.all)
Online License checked out Fri Dec 3 12:11:40 2021
Model fitted using the gamma parameterization.
ASReml 4.1.0 Fri Dec 3 12:11:40 2021
Allocating main workspace...done!
LogLik Sigma2 DF wall cpu
1 -16076.85 35.8618 1999 12:11:46 0.0 (3 restrained)
2 -16073.12 35.7258 1999 12:11:46 0.0 (3 restrained)
3 -16017.31 33.7528 1999 12:11:46 0.0 (3 restrained)
4 -15533.13 20.5139 1999 12:11:46 0.0 (2 restrained)
5 -15005.31 10.5261 1999 12:11:46 0.0
6 -14717.16 4.2863 1999 12:11:46 0.0
7 -14570.95 1.5645 1999 12:11:46 0.0
8 -14502.00 0.6640 1999 12:11:46 0.0
9 -14459.75 0.3756 1999 12:11:46 0.0
10 -14431.71 0.2897 1999 12:11:46 0.0
11 -14416.11 0.2762 1999 12:11:46 0.0
12 -14409.43 0.2735 1999 12:11:46 0.0
13 -14407.11 0.2723 1999 12:11:46 0.0
> # Get the Breeding values (BLUPS)
> blups.com<- predict(model.com, classify='Genotype', pworkspace='6gb')$pvals
Model fitted using the gamma parameterization.
ASReml 4.1.0 Fri Dec 3 12:11:47 2021
Allocating main workspace...done!
LogLik Sigma2 DF wall cpu
1 -14406.56 0.271769 1999 12:11:58 4.9
2 -14406.53 0.271708 1999 12:11:58 0.0
3 -14406.50 0.271581 1999 12:11:58 0.0
4 -14406.50 0.271568 1999 12:11:58 0.0
5 -14406.50 0.271568 1999 12:11:58 0.0
> # Plot the BLUPs
> # Plot the Histogram of MET BLUPs
> ggplot(data=blups.com, aes(predicted.value)) +
+ geom_histogram(color="darkblue", fill="lightblue")+ # adjust x values and breaks
+ geom_vline(aes(xintercept=mean(predicted.value)), # adding the line to represent mean
+ color="darkred", linetype="dashed", size=1)+
+ labs(title="MET BLUPs across all Environments",x="Value", y = "Count")+
+ theme_classic()+
+ theme (plot.title = element_text(color="black", size=14, face="bold", hjust=0),
+ axis.title.x = element_text(color="black", size=10, face="bold"),
+ axis.title.y = element_text(color="black", size=10, face="bold")) +
+ theme(axis.text= element_text(face = "bold", color = "black", size = 8))> rm(list=ls())
> # Read the saved csv file, if working directly
> if(exists('demo.data.filtered') && is.data.frame(get('demo.data.filtered'))){
+ demo.data.filtered=demo.data.filtered
+ }else{
+ demo.data.filtered<-read.csv(file="~/Documents/Analysis-pipeline/Outputs/Tables/demo.data.filtered.csv",
+ header = TRUE)
+ # factor conversion if below are not in factors
+ columns<-c("Environment", "Genotype", "Rep", "Block", "Row", "Column", "Line.type")
+ demo.data.filtered[, columns]<-lapply(columns, function(x) as.factor(demo.data.filtered[[x]]))
+ demo.data.filtered$Yield<-as.numeric(demo.data.filtered$Yield)
+ demo.data.filtered$HT<-as.numeric(demo.data.filtered$HT)
+ demo.data.filtered$DTF<-as.numeric(demo.data.filtered$DTF)
+ }
> # Subset the required columns
> demo.data.filtered<-demo.data.filtered[, c("Environment", "Genotype", "Rep", "Block",
+ "Row", "Column", "Line.type", "Yield", "HT", "DTF")]
> # First we will arrange the rows and columns for spatial analysis.
> # Now we will subset the environments and Yields for analysis
> demo.data.filtered<-data.frame(demo.data.filtered%>% group_by(Environment)%>%arrange(Row, Column)) # arrange by row and column
> demo.data.filtered<-data.frame(demo.data.filtered%>% arrange(Environment)) # Arrange by environment
>
> #demo.data.filtered<-demo.data.filtered[!demo.data.filtered$Environment %in% c("Env2", "Env5","Env8", "Env9"), ]
> # Subset the Environment 1
> #demo.data.filtered<-subset(demo.data.filtered, Environment=="Env1")
>
> table(demo.data.filtered$Environment)
Env1 Env10 Env11 Env2 Env3 Env5 Env6 Env7 Env8 Env9
400 400 400 400 400 400 400 400 400 400 Here in this section we will provide an example how to fit the model when we have marker data available. We will show how to extract the breeding values.
We will show it just for one model using MET Data set. And same can be extended to other models.
First we will read the Marker Data. Then we will subset and match it with phenotype data to make sure we have same number of genotypes in both data sets
Then We will construct Genomic Relationship matrix[1] that will be used in the model
Finally we will extract the results.
Further, we will show example of single-step Genomic selection, in which non-genetic effects and genetic effects are modeled together. More details on on single-step and two-step GS can be found here: Resource 1, Resource 2, Resource 3, Resource 4.
> # Read Marker Data
> geno.data<-read.csv(file="~/Documents/Analysis-pipeline/Geno_Data/geno.data.csv", header=TRUE)
> # Subset it to match with phenotypic data
> row.names(geno.data)<-geno.data$Genotype
> geno.data<-as.matrix(geno.data[,-1])Here we will construct the Genomic Relationship Matrix (GRM) using marker data. The GRM will be based on VanRanden (2008).
The steps used to create this GRM is:
\[ GRM= XX^t/m \]
More on relationship matrix can be found here Source 1, Source2
> # Remove the columns that have all missing values
> geno.data<-geno.data[ , !apply( geno.data , 2 , function(x) all(is.na(x)) ) ]
> # Impute missing values
> for (j in 1:ncol(geno.data)) {
+ geno.data[, j] <- ifelse(is.na(geno.data[, j]), mean(geno.data[, j], na.rm = TRUE), geno.data[,
+ j])
+ }
> #Xs <- scale(geno.data, center = TRUE)
> # Construct G matrix
> GM <- geno.data %*% t(geno.data)/ncol(geno.data)Here we will use one of the model to show how to fit the relationship matrix based on markers and phenotypic data together.
Description of general model in the matrix notation is as follows:
\[ Y= X\beta +Zu++Z_{g}u_{g}+Z_{b}u_{rb}+Z_{E}u_{E}+\epsilon\\ Y= \text{ is a m x 1 vector of individual phenotypes}\\ X= \text{is a design matrix relating phenotypes to fixed effects of replications}\\ \beta = \text{ is a vector of fixed effects of replications }\\ Z_{g}= \text{is a design matrix of marker effects}\\ u_{g}= \text{is a vector of random marker effects}\\ Z_{rb}=\text{is a design matrix of non-genetic block effects nested within replications}\\ u_{rb}=\text{is a vector of random block effects}\\ Z_{e}=\text{ is a design matrix of non-genetic random effect of environments and interactions }\\ u_{e}=\text{ is a vector of main environment and interaction random effects}\\ \epsilon= \text{is matrix of residual/error effects }\\ \]
Further, we assume random effects are normally distributed with zero mean vectors and variance–covariance matrices as B, G, R as described in single-stage approach analysis of MET analysis above. Here, variance of \(var(u_{g})=\sigma^2_{g}G\), where G is genomic or kinship covariance matrix of n x m dimensions (n is no. of markers and m is no. of individuals) representing the genomic similarity of individuals.
> model.geno<-asreml(fixed=Yield~Rep+Environment,
+ random=~vm(Genotype, GM)+Rep:Block,
+ residual = ~dsum(~ar1(Row):ar1(Column) |Environment),
+ #residual =~ar1v(Row):ar1(Column),
+ na.method="incude", data=demo.data.filtered)
Online License checked out Sun Sep 26 21:06:07 2021
Multi-section model using the sigma parameterization.
ASReml 4.1.0 Sun Sep 26 21:06:07 2021
LogLik Sigma2 DF wall cpu
1 -29625.44 1.0 3953 21:06:07 0.1 (1 restrained)
2 -29445.99 1.0 3953 21:06:07 0.1 (1 restrained)
3 -29289.59 1.0 3953 21:06:07 0.0 (2 restrained)
4 -29235.13 1.0 3953 21:06:07 0.0 (3 restrained)
5 -29218.35 1.0 3953 21:06:07 0.1
6 -29213.31 1.0 3953 21:06:07 0.0
7 -29212.78 1.0 3953 21:06:07 0.0
8 -29212.78 1.0 3953 21:06:07 0.0> summary(model.geno)$varcom
component std.error z.ratio bound %ch
Rep:Block 7.245260e+02 1.639584e+03 0.4418961 P 0.2
vm(Genotype, GM) 2.519241e+05 4.454527e+04 5.6554624 P 0.1
Environment_Env1!R 5.906242e+05 4.456933e+04 13.2518064 P 0.0
Environment_Env1!Row!cor 6.085717e-02 5.318598e-02 1.1442334 U 0.0
Environment_Env1!Column!cor 1.120398e-01 5.225812e-02 2.1439690 U 0.0
Environment_Env10!R 1.058673e+06 8.121875e+04 13.0348330 P 0.0
Environment_Env10!Row!cor 1.037412e-01 5.442204e-02 1.9062350 U 0.0
Environment_Env10!Column!cor 2.207110e-01 5.008493e-02 4.4067346 U 0.0
Environment_Env11!R 3.984138e+05 3.060795e+04 13.0166788 P 0.0
Environment_Env11!Row!cor -2.624080e-02 5.709132e-02 -0.4596286 U 0.1
Environment_Env11!Column!cor 6.181523e-02 5.323077e-02 1.1612688 U 0.0
Environment_Env2!R 5.179114e+05 3.880642e+04 13.3460243 P 0.0
Environment_Env2!Row!cor -6.301728e-02 5.617729e-02 -1.1217571 U 0.0
Environment_Env2!Column!cor -6.443656e-02 5.347136e-02 -1.2050668 U 0.0
Environment_Env3!R 4.688076e+05 3.658144e+04 12.8154503 P 0.0
Environment_Env3!Row!cor 4.768287e-02 5.947694e-02 0.8017035 U 0.1
Environment_Env3!Column!cor 1.196510e-01 5.332558e-02 2.2437827 U 0.0
Environment_Env5!R 2.574117e+06 1.854055e+05 13.8837154 P 0.0
Environment_Env5!Row!cor 3.992404e-02 5.458791e-02 0.7313713 U 0.0
Environment_Env5!Column!cor -3.203248e-02 5.110729e-02 -0.6267692 U 0.0
[ reached 'max' / getOption("max.print") -- omitted 12 rows ]> plot(model.geno)> # Get the Breeding values (BLUPS)
> gBLUPs<- predict(model.geno, classify='Genotype', pworkspace='6gb')$pvals
Multi-section model using the sigma parameterization.
ASReml 4.1.0 Sun Sep 26 21:06:11 2021
LogLik Sigma2 DF wall cpu
1 -29212.78 1.0 3953 21:06:17 5.2
2 -29212.78 1.0 3953 21:06:17 0.1
3 -29212.78 1.0 3953 21:06:17 0.1> # check for global spatial trends and extraneous variations
> met.asreml(model.geno)Note: Other results can be extracted in similar way as as shown in the MET analysis of Single Stage Analysis
Analysis and Handling of G × E in a Practical Breeding Program
A stage‐wise approach for the analysis of multi‐environment trials
Experimental design matters for statistical analysis: how to handle blocking
Random effects structure for confirmatory hypothesis testing: Keep it maximal
Generalized linear mixed models: a practical guide for ecology and evolution
Perils and pitfalls of mixed-effects regression models in biology
A brief introduction to mixed effects modelling and multi-model inference in ecology
Modeling Spatially Correlated and Heteroscedastic Errors in Ethiopian Maize Trials
More, Larger, Simpler: How Comparable Are On‐Farm and On‐Station Trials for Cultivar Evaluation
Rethinking the Analysis of Non‐Normal Data in Plant and Soil Science
Fundamentals of Experimental Design: Guidelines for Designing Successful Experiments
Note: For questions specific to data analysiss shown here contact waseem.hussain@irri.org
If your experiment needs a statistician, you need a better experiment - Ernest Rutherford